This is an interactive notebook that you can run locally.
Summarization using Chain of Density
Summarizing complex technical documents while preserving crucial details is a challenging task. The Chain of Density (CoD) summarization technique offers a solution by iteratively refining summaries to be more concise and information-dense. This guide demonstrates how to implement CoD using Weave for tracking and evaluating the application.
What is Chain of Density Summarization?
Chain of Density (CoD) is an iterative summarization technique that produces increasingly concise, information-dense summaries. It works by:
- Starting with an initial summary
- Iteratively refining the summary, making it more concise while preserving key information
- Increasing the density of entities and technical details with each iteration
Why use Weave?
In this tutorial, we’ll use Weave to implement and evaluate a Chain of Density summarization pipeline for ArXiv papers. You’ll learn how to:
- Track your LLM pipeline: Use Weave to automatically log inputs, outputs, and intermediate steps of your summarization process.
- Evaluate LLM outputs: Create rigorous, apples-to-apples evaluations of your summaries using Weave’s built-in tools.
- Build composable operations: Combine and reuse Weave operations across different parts of your summarization pipeline.
- Integrate seamlessly: Add Weave to your existing Python code with minimal overhead.
Set up the environment
First, let’s set up our environment and import the necessary libraries.
To get an Anthropic API key:
- Sign up for an account at https://www.anthropic.com
- Navigate to the API section in your account settings
- Generate a new API key
- Store the API key securely in your .env file
The weave.init(<project name>) call sets up a new Weave project for our summarization task.
Define the ArxivPaper model
We’ll create a simple ArxivPaper class to represent our data.
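As a rough illustration of what such a class might hold, the sketch below uses a standard-library dataclass. The notebook's actual class definition is not reproduced here, so every field name (entry_id, title, authors, summary, pdf_url) and the example values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArxivPaper:
    """Minimal record for one paper; every field name here is an assumption."""

    entry_id: str  # identifier for the paper, e.g. its abstract URL
    title: str
    authors: List[str] = field(default_factory=list)
    summary: str = ""  # the paper's abstract
    pdf_url: str = ""  # where the full text can be downloaded


# Placeholder values only; this is not a real paper.
paper = ArxivPaper(
    entry_id="example-id",
    title="An Example Paper",
    authors=["A. Author"],
    summary="Placeholder abstract.",
    pdf_url="https://example.org/paper.pdf",
)
```

A typed record like this keeps the downstream functions explicit about which pieces of a paper they consume.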
Load PDF content
To work with the full paper content, we’ll add a function to load and extract text from PDFs.
Implement Chain of Density summarization
Now, let’s implement the core CoD summarization logic using Weave operations:
- summarize_current_summary: Generates a single summary iteration based on the current state.
- iterative_density_summarization: Applies the CoD technique by calling summarize_current_summary multiple times.
- chain_of_density_summarization: Orchestrates the entire summarization process and returns the results.
By using @weave.op() decorators, we ensure that Weave tracks the inputs, outputs, and execution of these functions.
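The three functions could be sketched as follows. This is an illustration under assumptions rather than the notebook's code: call_llm is a hypothetical offline placeholder for a real LLM request, the prompt wording and the returned dictionary keys are invented for the example, and the try/except fallback exists only so the sketch also runs where Weave is not installed.

```python
try:
    import weave
    op = weave.op  # records inputs and outputs once weave.init(...) has run
except ImportError:  # assumption: keep the sketch runnable without Weave
    def op():
        return lambda fn: fn


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; echoes the prompt's tail."""
    return " ".join(prompt.split()[-15:])


@op()
def summarize_current_summary(document, instruction, current_summary="", iteration=1):
    """Produce one summary iteration from the current state."""
    prompt = (
        f"Instruction: {instruction}\n"
        f"Iteration: {iteration}\n"
        f"Current summary: {current_summary}\n"
        "Rewrite the summary to be more concise and more entity-dense.\n"
        f"Document: {document}"
    )
    return call_llm(prompt)


@op()
def iterative_density_summarization(document, instruction, current_summary, density_iterations):
    """Refine the summary density_iterations times, keeping each version."""
    summaries = []
    for iteration in range(1, density_iterations + 1):
        current_summary = summarize_current_summary(
            document, instruction, current_summary, iteration
        )
        summaries.append(current_summary)
    return current_summary, summaries


@op()
def chain_of_density_summarization(document, instruction, density_iterations=2):
    """Run the whole process and return the final and intermediate summaries."""
    final_summary, summaries = iterative_density_summarization(
        document, instruction, "", density_iterations
    )
    return {"final_summary": final_summary, "iteration_summaries": summaries}
```

Because each op calls the next, the nested calls would appear as a trace tree in Weave once a project has been initialized. Swapping call_llm for a real client call is the only change needed to produce actual summaries.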
Create a Weave Model
Now, let’s wrap our summarization pipeline in a Weave Model:
The ArxivChainOfDensityPipeline class encapsulates our summarization logic as a Weave Model, providing several key benefits:
- Automatic experiment tracking: Weave captures inputs, outputs, and parameters for each run of the model.
- Versioning: Changes to the model’s attributes or code are automatically versioned, creating a clear history of how your summarization pipeline evolves over time.
- Reproducibility: The versioning and tracking make it easy to reproduce any previous result or configuration of your summarization pipeline.
- Hyperparameter management: Model attributes (like model and density_iterations) are clearly defined and tracked across different runs, facilitating experimentation.
- Integration with Weave ecosystem: Using weave.Model allows seamless integration with other Weave tools, such as evaluations and serving capabilities.
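A class with the two attributes named above might be shaped like the sketch below. Several parts are assumptions: summarize is a hypothetical placeholder for the CoD routine, the predict signature and the returned keys are illustrative, "example-model" is not a real model identifier, and the fallback Model class exists only so the sketch runs where Weave is not installed.

```python
try:
    import weave
    Model, op = weave.Model, weave.op
except ImportError:  # assumption: plain-Python fallback so the sketch runs without Weave
    class Model:
        def __init__(self, **attrs):
            for name, value in attrs.items():
                setattr(self, name, value)

    def op():
        return lambda fn: fn


def summarize(text: str, instruction: str, iterations: int) -> str:
    """Hypothetical placeholder for the CoD routine: halves the word count each pass."""
    words = text.split()
    for _ in range(iterations):
        words = words[: max(1, len(words) // 2)]
    return " ".join(words)


class ArxivChainOfDensityPipeline(Model):
    model: str  # LLM identifier, tracked as a hyperparameter
    density_iterations: int = 3  # number of refinement passes

    @op()
    def predict(self, paper_text: str, instruction: str) -> dict:
        summary = summarize(paper_text, instruction, self.density_iterations)
        return {"model": self.model, "final_summary": summary}


pipeline = ArxivChainOfDensityPipeline(model="example-model", density_iterations=2)
result = pipeline.predict("one two three four five six seven eight", "Summarize the method")
```

Declaring model and density_iterations as attributes, rather than as arguments to predict, is what lets a change in either value register as a new version of the pipeline.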
Implement evaluation metrics
To assess the quality of our summaries, we’ll implement simple evaluation metrics.
Create a Weave Dataset and run evaluation
To evaluate our pipeline, we’ll create a Weave Dataset and run an evaluation:
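The sketch below pairs two simple scoring functions with an evaluation run. It rests on several assumptions: the heuristics (compression ratio and keyword coverage) are illustrative stand-ins rather than the notebook's metrics, the dataset columns paper_text, instruction, and keywords and the dataset name are hypothetical, and the scorer receives the prediction under the name output, which is the convention in recent Weave releases (older releases used model_output).

```python
def compression_ratio(source: str, summary: str) -> float:
    """Summary length as a fraction of source length, in words."""
    source_words = len(source.split())
    return len(summary.split()) / source_words if source_words else 0.0


def keyword_coverage(keywords, summary: str) -> float:
    """Share of expected keywords that appear in the summary (case-insensitive)."""
    if not keywords:
        return 0.0
    text = summary.lower()
    return sum(kw.lower() in text for kw in keywords) / len(keywords)


def summary_scorer(paper_text: str, keywords: list, output: dict) -> dict:
    """Scorer taking dataset columns by name plus the model output."""
    summary = output["final_summary"]
    return {
        "compression_ratio": compression_ratio(paper_text, summary),
        "keyword_coverage": keyword_coverage(keywords, summary),
    }


async def run_evaluation(pipeline, rows, project_name):
    """Log the rows as a Weave Dataset and score the pipeline on them."""
    import weave  # third-party; imported lazily so the scorers work without it

    weave.init(project_name)
    dataset = weave.Dataset(name="arxiv-papers", rows=rows)
    evaluation = weave.Evaluation(
        dataset=dataset,
        scorers=[weave.op()(summary_scorer)],  # wrap the scorer as a Weave op
    )
    return await evaluation.evaluate(pipeline)
```

In a notebook, where an event loop is already running, the coroutine can be awaited directly; in a script it would be passed to asyncio.run. Each row is expected to be a dictionary whose keys match the argument names of the pipeline's predict method and of the scorer.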

Conclusion
In this example, we’ve demonstrated how to implement a Chain of Density summarization pipeline for ArXiv papers using Weave. We’ve shown how to:
- Create Weave operations for each step of the summarization process
- Wrap the pipeline in a Weave Model for easy tracking and evaluation
- Implement custom evaluation metrics using Weave operations
- Create a dataset and run an evaluation of the pipeline